Fast Text Access Methods for Optical and Large Magnetic Disks: Designs and Performance Comparison

Authors

  • Christos Faloutsos
  • Raphael Chan
Abstract

High capacity disks, especially optical ones, are commercially available. These disks are ideal for archiving large text data bases. In this work, we examine efficient searching techniques for such applications. We propose a unifying framework, which reveals the similarities between signature files and an inverted file using a hash table. Then, we design methods that combine the ease of insertion of the signature files with the fast retrieval of the inverted files. We develop analytical models for their performance and verify them through experimentation on a 2.8 Mb data base. The agreement between theory and experimentation is very good. The results show that the proposed methods achieve fast retrieval and require a modest 10%-30% space overhead (as opposed to 50%-300% overhead [13] for the inverted files). They also do not require re-writing; thus, they can handle insertions easily, they permit searches during an insertion, and they can be used with write-once optical disks. Using our verified model, the performance predictions for the proposed methods on large data bases (e.g., 250 Mb) are very promising.
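The similarity the abstract points to between signature files and a hashed inverted file can be illustrated with a minimal sketch. This is not the authors' design: the class names, the signature width `SIG_BITS`, the number of bits per word `BITS_PER_WORD`, and the bucket count are all assumptions chosen for illustration. Both structures hash each word, both are append-only on insertion, and both return candidate documents that may include false drops and must be checked against the text.

```python
import hashlib

SIG_BITS = 64       # assumed signature width, for illustration only
BITS_PER_WORD = 3   # assumed number of bits each word sets


def _hash(word, seed):
    """Deterministic hash of a word under a given seed (hypothetical helper)."""
    digest = hashlib.sha256(f"{seed}:{word}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def word_signature(word):
    """Bit pattern for a single word (superimposed coding)."""
    sig = 0
    for seed in range(BITS_PER_WORD):
        sig |= 1 << (_hash(word, seed) % SIG_BITS)
    return sig


class SignatureFile:
    """One signature per document; inserting only appends."""

    def __init__(self):
        self.signatures = []

    def insert(self, words):
        sig = 0
        for w in words:
            sig |= word_signature(w)   # OR the word patterns together
        self.signatures.append(sig)
        return len(self.signatures) - 1  # document id

    def candidates(self, word):
        q = word_signature(word)
        # Sequential scan: keep documents whose signature covers the query bits.
        return [i for i, s in enumerate(self.signatures) if s & q == q]


class HashedInvertedFile:
    """Postings lists addressed by a hash of the word, not by the word itself."""

    def __init__(self, buckets=1024):
        self.buckets = [[] for _ in range(buckets)]

    def insert(self, doc_id, words):
        for w in set(words):
            self.buckets[_hash(w, 0) % len(self.buckets)].append(doc_id)

    def candidates(self, word):
        # Direct lookup of one bucket; words that collide share a list.
        return sorted(set(self.buckets[_hash(word, 0) % len(self.buckets)]))
```

In this sketch the signature file is cheap to extend but must be scanned in full, while the hashed inverted file answers a query from a single bucket; that trade-off is the one the abstract says the proposed methods try to balance.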

Similar resources

Archiving Techniques for Temporal Databases

This paper describes archiving strategies for append-only temporal databases. We present a storage architecture where optical disks work in tandem with magnetic disks. Magnetic disks are used for storing current versions and recent past versions, whereas optical disks are dedicated for archiving older past versions. Similarly, temporal access structures are stored on both magnetic and optical d...

Parallel file striping on optical jukebox servers

In the near future, large digital media servers are expected to offer storage capacities in the order of petabytes. Servers made of clusters of PC's connected to jukeboxes may represent an interesting alternative compared with servers made of arrays of magnetic disks. However, due to disk exchange overhead, higher seek times and lower data transfer rates, access to data located on optical disks...

Challenges for Tertiary Storage in Multimedia Servers

The low cost per megabyte of optical disk and magnetic tape storage make these technologies particularly attractive for use in large capacity storage servers, including multimedia servers. However, these devices have performance problems that range from high costs for many optical drives to low performance and lack of random access in tape drives. We evaluate the performance on multimedia appli...

Optimized Binary Search and Text Retrieval

We present an algorithm that minimizes the expected cost of indirect binary search for data with non-constant access costs, such as disk data. Indirect binary search means that sorted access to the data is obtained through an array of pointers to the raw data. One immediate application of this algorithm is to improve the retrieval performance of disk databases that are indexed using the suffix ar...

Publication date: 1988